1. INTRODUCTION

Ambiguous sentences are sentences that have more than one meaning. Ambiguity is divided into three types,
namely phonetic, lexical, and grammatical; this research focuses on grammatical ambiguity. Grammatical
ambiguity occurs due to incorrect grammar usage; however, this ambiguity disappears once the word is used
within a sentence [1-4]. In Indonesian, the inability to understand ambiguous sentences often stems from
differences in language use, level of education, and culture [5]. An ambiguous word is a word whose meaning
is vague (unclear). Indonesian has a number of grammatically ambiguous words, such as "bulan (moon/month)".
"Bulan" has two meanings: the first is "an astronomical object orbiting the Earth", and the second is "a
period of time" [6, 7]. Grammatically ambiguous sentences pose little problem in direct conversation, in
dialogue between humans, or in sentences read by humans [8], because humans can interpret an ambiguous word
from the topic of conversation and from the words around it. Computers, in contrast, lack this ability and
cannot detect ambiguous sentences on their own. A grammatical ambiguous sentence detection system can
determine the meaning of an ambiguous sentence and translate it according to that meaning [9]. The aim of
this system is to enable computers to understand ambiguous Indonesian sentences properly.

Research on grammatically ambiguous sentences has not been widely developed, especially on the detection of
grammatically ambiguous sentences in Indonesian. Existing research on ambiguous sentences can only find
ambiguous sentences; it cannot determine their meaning [10]. So far, no data set of grammatically ambiguous
Indonesian sentences is available, even though a detector for such sentences is highly needed, for
instance to improve the accuracy of translation systems and to make it easier for computers to understand
a text. The expected novelty of this research is therefore a grammatical ambiguous sentence detection
system for Indonesian that uses the Boyer-Moore algorithm.


2. RESEARCH METHOD 

Figure 1 shows the processes involved in the ambiguous sentence detection system using the Boyer-Moore
algorithm. A sentence is entered and then checked with the Boyer-Moore algorithm to determine whether it
contains any ambiguous words. If the sentence contains an ambiguous word, the meaning of the sentence is
then determined using the cosine similarity method. Several steps are needed to build the system; the
research method is described below.


Figure 1. Flowchart of the detection of ambiguous Indonesian sentences with the Boyer-Moore algorithm (nodes: sentence input; ambiguous word search with Boyer-Moore; ambiguous sentence data; cosine similarity; ambiguous sentence; not an ambiguous sentence)


2.1. Sentence input 

The input consists of sentences for which it is not yet known whether they contain ambiguity. The sentences
are conversational sentences in Indonesian. Indonesian has several types of ambiguous sentences, namely
grammatical, lexical, and phonetic [11]; this research focuses on grammatical ambiguity. Grammatically
ambiguous sentences are ambiguous sentences that arise from incorrect grammar use, but this ambiguity
disappears once the word is used in a sentence. The following is an example of an ambiguous sentence:


"Setiap awal bulan kami gajian (We are paid at the beginning of each month)"


The sentence above contains the ambiguous word "bulan (month)". The word "bulan" has two meanings:

- Bulan (month) = a period of time 

- Bulan (moon) = sky object 
Figure 2 illustrates that the word "bulan" has two meanings: one refers to a particular unit of time (month),
and the other to the Earth's natural satellite (moon, an object in the sky). At this stage, which meaning is
intended is still unknown. The following is a simple description of an ambiguous word.


Figure 2. Grammatical ambiguous word description 


2.2. Ambiguous word search using Boyer-Moore 

The Boyer-Moore algorithm is a string-searching algorithm [12-20]. It performs exact matching: it reports a
match only where the pattern occurs character for character in the text. The following steps show how the
Boyer-Moore algorithm is used to find ambiguous words.


2.2.1. 1st step

Figure 3 shows the process of searching for the ambiguous word "bulan" in the sentence "setiap awal bulan
kami gajian (at the beginning of each month we are paid)". The search begins with the word aligned at the
first character of the sentence and proceeds from left to right. If the word is not found at that position,
the search is repeated starting from the second character.


Figure 3. Ambiguous word search step 1 (panels: input sentence; matching strings with Boyer-Moore)


2.2.2. 2nd step

Figure 4 shows the process of searching for the ambiguous word "bulan" in the sentence "setiap awal bulan
kami gajian (at the beginning of each month we are paid)". This step continues the first one: the search now
starts from the second character.


Figure 4. Ambiguous word search step 2 (panels: sentence input; matching strings with Boyer-Moore)


2.2.3. 13th step

Figure 5 shows that the search for the ambiguous word "bulan" in the sentence "setiap awal bulan kami gajian
(at the beginning of each month we are paid)" succeeds: in the 13th step, the word "bulan" is found starting
at the 13th character, so a grammatically ambiguous word has been detected. In this study, the Boyer-Moore
algorithm is used to check strings: the input (a sentence not yet known to be grammatically ambiguous) is
matched against a data set of words that have been identified as grammatically ambiguous. At present, the
data set contains only 50 words, because no previous researchers have developed applications for
grammatically ambiguous sentences in Indonesian.
Figure 5. Ambiguous word search step 13 (panels: sentence input; matching strings with Boyer-Moore)
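The stepwise search in Figures 3 to 5 can be sketched in Python. This is a minimal, simplified Boyer-Moore sketch, assuming only the bad-character rule (the full algorithm also uses a good-suffix rule); the function names are illustrative and not taken from the authors' implementation. It returns the 0-based start of the first match (or -1) together with the alignments it tried.

```python
def build_last_occurrence(pattern: str) -> dict[str, int]:
    """Bad-character table: the last index of each character in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def boyer_moore_search(text: str, pattern: str) -> tuple[int, list[int]]:
    """Return (start of first match or -1, list of alignments tried).

    Simplified Boyer-Moore: only the bad-character rule is applied.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0, []
    last = build_last_occurrence(pattern)
    alignments = []
    s = 0  # position of the pattern's first character in the text
    while s <= n - m:
        alignments.append(s)
        j = m - 1
        # Compare the pattern with the text from right to left.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            return s, alignments  # every character matched
        # Align the mismatched text character with its last occurrence in
        # the pattern; if it does not occur there, move the pattern past it.
        s += max(1, j - last.get(text[s + j], -1))
    return -1, alignments


sentence = "setiap awal bulan kami gajian"
index, tried = boyer_moore_search(sentence, "bulan")
print(index + 1, tried)  # 1-based position of "bulan", alignments tried
```

For this sentence the sketch prints `13 [0, 1, 6, 8, 12]`: the match starts at the 13th character, as in Figure 5, and because the bad-character rule can shift the pattern by more than one position, the match is reached after five alignments.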


Figure 6 shows the flowchart of the search for an ambiguous word in a sentence. If the ambiguous word has not
been found, the search continues at the next character, until the word is either found or declared missing.
As soon as the word is found, the search terminates and the sentence is marked as containing the ambiguous
word. The following is a flowchart describing the Boyer-Moore string-searching process.
Figure 6. String searching flowchart using the Boyer-Moore algorithm
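The loop in Figure 6, which stops at the first ambiguous word it finds and otherwise declares the word missing, can be sketched as follows. The word list is a hypothetical excerpt based on Table 1, and Python's built-in `str.find` stands in for a Boyer-Moore search (both report -1 when the pattern is absent); the function name is illustrative.

```python
from typing import List, Optional

# Hypothetical excerpt of the ambiguous-word data set (words from Table 1).
AMBIGUOUS_WORDS = ["budi", "salam", "tahu", "bunga", "bangku", "kemas", "bulan"]


def find_first_ambiguous_word(sentence: str, words: List[str]) -> Optional[str]:
    """Return the first data-set word found in the sentence, or None.

    str.find stands in for a Boyer-Moore search that returns -1 on failure.
    """
    text = sentence.lower()
    for word in words:
        if text.find(word) != -1:
            return word  # found: stop searching immediately
    return None  # every word checked without a match: declared missing


result = find_first_ambiguous_word("Setiap awal bulan kami gajian", AMBIGUOUS_WORDS)
print(result if result is not None else "not an ambiguous sentence")
```

It prints `bulan`. Because this is plain substring matching, "tahu" would also be reported inside "setahun" (one year); matching whole tokens is one way to avoid such hits.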


2.3. Ambiguous sentence data set

The grammatical ambiguous sentence data set is a collection of ambiguous words and sentences used as a
benchmark [21-23]. Because no such data set was available, this research collected ambiguous words and
sentences from Indonesian linguists. The resource person is an Indonesian language lecturer, Encil
Puspitoningrum, M. Pd. The following table lists the ambiguous words and sentences obtained from her. Table 1
has three columns: "Ambiguous words" lists the ambiguous words, "Sentence" contains a sentence that uses the
ambiguous word, and "Meaning" gives the meaning of the word in that sentence.


Table 1. Ambiguous words and sentences

Ambiguous words | Sentence | Meaning
Budi (Mind) | Aku mengenang budi baikmu (I remember your kindness) | Kebaikan (Kindness)
Salam (Regards) | Gus kamu kemarin mendapatkan salam dari anggi (Gus, Anggi sent you regards yesterday) | Sapaan (Greetings)
Tahu (Tofu) | Agus kesini tadi memberi tahu (Agus came here to give us tofu) | Makanan (Food)
Bunga (Interest) | Bunga deposito di bank jatim lumayan tinggi (The deposit interest rate in Bank Jatim is quite high) | Keuntungan (Profit)
Bangku (Bench) | Dia tidak pernah makan bangku sekolah (He never went to school) | Pendidikan (Education)
Kemas (Organized) | Acara ini dikemas dengan sangat baik (This event is very well organized) | Melakukan pekerjaan (Doing work)
Bulan (Month) | Awal bulan kamu gajian (You are paid at the beginning of the month) | Waktu (Time)
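One way to hold the three columns of Table 1 in a program is a small record type. The class and function names below are hypothetical, and only a few rows of Table 1 are transcribed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AmbiguousEntry:
    word: str      # column 1: the ambiguous word
    sentence: str  # column 2: a sentence that uses the word
    meaning: str   # column 3: the meaning of the word in that sentence


# A few rows transcribed from Table 1.
DATASET = [
    AmbiguousEntry("budi", "Aku mengenang budi baikmu", "Kebaikan (Kindness)"),
    AmbiguousEntry("tahu", "Agus kesini tadi memberi tahu", "Makanan (Food)"),
    AmbiguousEntry("bulan", "Awal bulan kamu gajian", "Waktu (Time)"),
]


def entries_for(word: str) -> List[AmbiguousEntry]:
    """Return every data-set row recorded for an ambiguous word."""
    return [e for e in DATASET if e.word == word.lower()]


for entry in entries_for("Bulan"):
    print(entry.sentence, "->", entry.meaning)
```

Keeping several rows per word lets the same structure store one example sentence for each meaning of an ambiguous word.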
2.4. Cosine similarity

The input sentences are compared with the sentences in the data set to measure how closely they resemble each
other. Cosine similarity is used to calculate the similarity between sentences [24-28]. In this research, the
input sentence is "setiap awal bulan kami gajian", which has been identified as containing the grammatically
ambiguous word "bulan (month)". The sentences related to "bulan (month)" are:

- Awal bulan kami gajian (We are paid at the beginning of the month). 
- Bumi dan bulan merupakan benda langit (The Earth and The Moon are sky objects). 

Cosine similarity measures how similar two sentences are; once the similarity is known, the system can show
the meaning of the sentence. The cosine similarity calculation is illustrated below with the closeness between
"awal bulan kami gajian" (We are paid at the beginning of each month) and "awal bulan kamu gajian" (You are
paid at the beginning of each month):

- S1 = awal bulan kami gajian (We are paid at the beginning of the month).

- S2 = awal bulan kamu gajian (You are paid at the beginning of the month). 

Table 2 records the occurrence of each word in the two sentences. Column A is 1 if the word occurs in S1 and 0
otherwise; column B does the same for S2. Columns A.B, A² and B² hold the products used in the similarity
calculation, and the last row gives their totals.


Table 2. Word occurrence in S1 (A) and S2 (B)

Word | A | B | A.B | A² | B²
Awal (beginning) | 1 | 1 | 1 | 1 | 1
Bulan (month) | 1 | 1 | 1 | 1 | 1
Kami (we) | 1 | 0 | 0 | 1 | 0
Kamu (you) | 0 | 1 | 0 | 0 | 1
Gajian (paid) | 1 | 0 | 0 | 1 | 0
Total | | | 2 | 4 | 3
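The occurrence columns of a table like Table 2 can be built automatically from two sentences. This is a hypothetical sketch that assumes whitespace tokenization and lowercase comparison; the function name is illustrative.

```python
def occurrence_table(s1: str, s2: str):
    """Build word-occurrence columns like those in Table 2.

    A[i] is 1 if the i-th word occurs in s1, else 0; B[i] likewise for s2.
    Returns the word list, both vectors and the totals of A.B, A^2 and B^2.
    """
    tokens1, tokens2 = s1.lower().split(), s2.lower().split()
    vocabulary = list(dict.fromkeys(tokens1 + tokens2))  # first-seen order
    a = [1 if w in tokens1 else 0 for w in vocabulary]
    b = [1 if w in tokens2 else 0 for w in vocabulary]
    totals = {
        "A.B": sum(x * y for x, y in zip(a, b)),
        "A^2": sum(x * x for x in a),
        "B^2": sum(y * y for y in b),
    }
    return vocabulary, a, b, totals


words, a, b, totals = occurrence_table("awal bulan kami gajian", "awal bulan kamu gajian")
for w, x, y in zip(words, a, b):
    print(f"{w:8} {x} {y}")
print(totals)
```

Run on S1 and S2, the sketch marks "gajian" as occurring in both sentences, so it returns totals A.B = 3, A² = 4 and B² = 4.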


Cosine similarity is a method for calculating the degree of similarity between two objects, and it is a
commonly used similarity function for data clustering. Using the totals of Table 2, the similarity of S1 and
S2 is calculated as shown in (1):


$$\text{Similarity} = \cos(\theta) = \frac{\sum A \cdot B}{\sum A^{2} \times \sum B^{2}} = \frac{2}{4 \times 3} \approx 0.166 \tag{1}$$
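For reference, here is a minimal sketch of cosine similarity in its standard vector form, cos(θ) = (A · B) / (‖A‖ ‖B‖), where the denominator is the product of the norms √ΣA² and √ΣB². The function name and the zero-vector convention are assumptions of this sketch.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Standard cosine similarity: (A . B) / (||A|| * ||B||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


# Columns A and B of Table 2 (awal, bulan, kami, kamu, gajian).
A = [1, 1, 1, 0, 1]
B = [1, 1, 0, 1, 0]
print(round(cosine_similarity(A, B), 3))
```

With the A and B columns of Table 2 it prints `0.577`, i.e. 2/√12.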





Using the cosine similarity method, the highest closeness is 0.166; Table 3 gives the details. Table 3 contains
two values, 0.16 and 0.05. Because the sentences "setiap awal bulan kami gajian" and "setiap awal bulan kamu
gajian" have the higher similarity value, it can be concluded that the word "bulan" in the input sentence means
"a period of time".


Table 3. The results of analyzing the meaning of sentences using cosine similarity

Id | Input sentence | Data set sentence | Cosine similarity
1 | Setiap awal bulan kami gajian (We are paid at the beginning of each month) | Setiap awal bulan kamu gajian (You are paid at the beginning of each month) | 0.16
2 | Setiap awal bulan kami gajian (We are paid at the beginning of each month) | Bumi dan bulan merupakan benda langit (Earth and moon are sky objects) | 0.05


2.5. Determining the meaning of sentences when being processed in the program

The Boyer-Moore algorithm and cosine similarity processes produce the following result [29, 30]: "bulan" in
the input sentence means a period of time (month). The results are also implemented on the web. Figure 7 shows
the results of the calculations performed with the cosine similarity method. At this stage, the sentence
"Setiap awal bulan kami gajian (We are paid at the beginning of each month)" was tested for similarity with the
sentences "Setiap awal bulan kamu gajian (You are paid at the beginning of each month)" and "Bumi dan Bulan
merupakan benda langit (The Earth and The Moon are sky objects)". The cosine similarity calculation shows that
the input sentence is closest in meaning to "Setiap awal bulan kamu gajian (You are paid at the beginning of
each month)", with a similarity value of 0.16; in this sentence the word "bulan" means a period of time (month).


Figure 7. The results of detecting grammatical ambiguous sentences using cosine similarity (panel: ambiguous sentence detection results)
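The meaning-selection step of this section, picking the sense whose data-set sentence is most similar to the input, can be sketched as below. It assumes whitespace tokenization and uses the set form of cosine similarity over binary word-occurrence vectors, |W1 ∩ W2| / √(|W1| · |W2|); the data-set rows, sense labels and function names are a hypothetical excerpt built from the sentences in this section.

```python
import math
from typing import List, Tuple

# Hypothetical data-set rows for "bulan": (sentence, meaning of "bulan").
SENSES = [
    ("setiap awal bulan kamu gajian", "a period of time (month)"),
    ("bumi dan bulan merupakan benda langit", "sky object (moon)"),
]


def set_cosine(s1: str, s2: str) -> float:
    """Cosine similarity of binary word-occurrence vectors, in set form."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    if not w1 or not w2:
        return 0.0
    return len(w1 & w2) / math.sqrt(len(w1) * len(w2))


def choose_meaning(sentence: str, senses: List[Tuple[str, str]]) -> Tuple[str, float]:
    """Return the meaning of the most similar data-set sentence and its score."""
    best_sentence, best_meaning = max(senses, key=lambda row: set_cosine(sentence, row[0]))
    return best_meaning, set_cosine(sentence, best_sentence)


meaning, score = choose_meaning("Setiap awal bulan kami gajian", SENSES)
print(meaning)
```

For the example input it selects "a period of time (month)", the same outcome as Table 3.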








3. RESULTS AND ANALYSIS 
3.1. Accuracy, precision, recall and F-measure test 

At this stage, the system is tested using a confusion matrix, which is commonly used to determine precision,
accuracy, and recall [31-33]. The confusion matrix shows how well the system recognizes grammatically ambiguous
sentences. The experiment was carried out 200 times, and 50 of the words in the database are ambiguous words.
An ambiguous word that the system correctly detects is counted as a true positive (TP). A word that is not
ambiguous but is captured as ambiguous is a false positive (FP). An ambiguous word that the system fails to
recognize is a false negative (FN), and a word that is not ambiguous and is not captured is a true negative
(TN). The calculations are shown in Table 4.


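Table 4. Confusion matrix values

TP = 40 | FP = 3
FN = 10 | TN = 147

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{40 + 147}{40 + 147 + 3 + 10} = 0.935 \tag{2}$$

$$\text{Precision} = \frac{TP}{TP + FP} = \frac{40}{40 + 3} = 0.9302 \tag{3}$$

$$\text{Recall} = \frac{TP}{TP + FN} = \frac{40}{40 + 10} = 0.8 \tag{4}$$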


All of these metrics range from 0 to 1. The calculations above give an accuracy of 0.935, a precision of 0.9302,
and a recall of 0.8. The recall value shows that the system recognizes 80% of the ambiguous words; the
remaining ambiguous words go unrecognized because of the limited data set. The F-measure is an evaluation
measure in information retrieval that combines recall and precision. In some situations recall and precision
may be weighted differently; with equal weights, the F-measure is the harmonic mean of recall and precision and
also ranges from 0 to 1. From (5), the F-measure value is 0.86.

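$$F\text{-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \times 0.9302 \times 0.8}{0.9302 + 0.8} \approx 0.860 \tag{5}$$
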
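Equations (2) to (5) can be checked with a few lines of Python. This sketch assumes nonzero denominators; the function name is illustrative.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F-measure (F1) from a confusion matrix.

    Assumes every denominator is nonzero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Counts from Table 4.
for name, value in confusion_metrics(tp=40, fp=3, fn=10, tn=147).items():
    print(f"{name}: {value:.4f}")
```

With the Table 4 counts it prints accuracy 0.9350, precision 0.9302, recall 0.8000 and f_measure 0.8602.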
3.2. The speed in detecting ambiguous words 

The time the system needs to process a grammatically ambiguous sentence varies from sentence to sentence,
depending on the number of characters processed [34]. The average detection time per sentence is 0.003275
seconds. Speed measurements are needed to analyze system performance; the detection times are presented in
Table 5. The fastest detection, 0.0024 seconds, is for the sentence "Dia bagai kuda hitam (He is like a dark
horse)", while the slowest is 0.0042 seconds, for the sentence "Acara ini dikemas dengan sangat baik (This
event is very well organized)". The following table shows the time taken to detect ambiguous words:
Table 5. Speed in detecting ambiguous words

Ambiguous words | Sentences | Time (s)
Kemas (Organized) | Acara ini dikemas dengan sangat baik (This event is very well organized) | 0.0042
Budi (Mind) | Aku mengenang budi baikmu (I remember your kindness) | 0.0036
Salam (Regards) | Gus kamu kemarin mendapatkan salam dari anggi (Gus, Anggi sent you regards yesterday) | 0.0039
Tahu (Tofu) | Agus kesini tadi memberi tahu (Agus came here to give us tofu) | 0.0039
Bunga (Interest) | Bunga deposito di bank jatim lumayan tinggi (The deposit interest rate in Bank Jatim is quite high) | 0.0037
Bangku (Bench) | Dia tidak pernah makan bangku sekolah (He never went to school) | 0.0043
Kuda (Horse) | Dia bagai kuda hitam (He is like a dark horse) | 0.0024
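The paper does not describe how detection times were measured. The following hypothetical sketch shows one way to time the detection step for a single sentence with `time.perf_counter`, averaging over repeated runs; `str.find` stands in for the Boyer-Moore search, the word list is an assumed excerpt based on Tables 1 and 5, and absolute times depend on the hardware, so they will not reproduce Table 5.

```python
import time
from typing import List, Tuple


def time_detection(sentence: str, words: List[str], repeats: int = 1000) -> Tuple[float, List[str]]:
    """Return (average seconds per scan, ambiguous words found).

    str.find stands in for the Boyer-Moore search; averaging over many
    repeats reduces timer noise for very short sentences.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    text = sentence.lower()
    found: List[str] = []
    start = time.perf_counter()
    for _ in range(repeats):
        found = [w for w in words if text.find(w) != -1]
    elapsed = time.perf_counter() - start
    return elapsed / repeats, found


# Hypothetical word list based on Tables 1 and 5.
WORDS = ["budi", "salam", "tahu", "bunga", "bangku", "kemas", "bulan", "kuda"]
seconds, found = time_detection("Dia bagai kuda hitam", WORDS)
print(found, f"{seconds:.8f} s")
```

Averaging over many repeats is a common choice for microsecond-scale operations, where a single timer reading is dominated by noise.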

4. CONCLUSION 


Grammatically ambiguous sentences in Indonesian are sentences that have two meanings. To recognize such
sentences, this study combines the Boyer-Moore algorithm with the cosine similarity algorithm. The Boyer-Moore
algorithm is used to find strings (ambiguous words), while cosine similarity calculates the degree of
similarity between two objects; it reveals the meaning of a sentence by measuring the similarity of the test
data to the data set. The combination of the Boyer-Moore algorithm and cosine similarity is effective for
detecting ambiguous words, as shown by the system's recall of 80%. The average time for the Boyer-Moore
algorithm to detect ambiguous sentences is 0.003275 seconds.